[0089] where (μ_xi, μ_yi) is the pair of true mean intensities
for gene i. For each i and j, the multiplicative errors ε_xij
and ε_yij are drawn from a bivariate normal distribution with
means 0, standard deviations σ_εx and σ_εy, and correlation
ρ_ε. The additive errors δ_xij and δ_yij are distributed
analogously, with parameters σ_δx, σ_δy and ρ_δ. Thus,
multiplicative and additive errors are independent of one
another but can each be highly correlated between x and y; in
practice ρ_ε is large and ρ_δ is small. While x_ij and y_ij
can be negative if the foreground is less than the estimated
background for a spot, the true intensities μ_xi and μ_yi must
be non-negative. Consequently, the samples (x_ij and y_ij) are
described by a bivariate normal probability density function p
with parameters μ_xi, μ_yi, σ_xi, σ_yi and ρ_xyi, where:
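The display equations that belong here are not legible in this text. As a derivation sketch only, assuming the model x_ij = μ_xi + μ_xi ε_xij + δ_xij (and analogously for y) recited in claim 37, with multiplicative and additive errors independent of one another, the usual variance and covariance rules would give:

```latex
\begin{aligned}
\sigma_{xi}^{2} &= \mu_{xi}^{2}\,\sigma_{\varepsilon x}^{2} + \sigma_{\delta x}^{2},\\
\sigma_{yi}^{2} &= \mu_{yi}^{2}\,\sigma_{\varepsilon y}^{2} + \sigma_{\delta y}^{2},\\
\rho_{xyi}\,\sigma_{xi}\,\sigma_{yi} &=
  \mu_{xi}\,\mu_{yi}\,\rho_{\varepsilon}\,\sigma_{\varepsilon x}\,\sigma_{\varepsilon y}
  + \rho_{\delta}\,\sigma_{\delta x}\,\sigma_{\delta y}.
\end{aligned}
```

Because each sample is a linear combination of normally distributed errors, the pair remains bivariate normal with means μ_xi and μ_yi.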

[0098] All stages of the optimization were performed using
the procedure fmincon provided by Matlab and described by
Coleman et al., Matlab Optimization Toolbox User's Guide
(3rd ed., Mathworks, Inc., Natick, Mass., 1999), which is
incorporated herein by reference. The optimization was also
implemented in C code, which produces comparable optimal
parameters in substantially less execution time (less than
10 minutes on a Pentium III 500 for N≈6000, M≈4, as compared
with 4-5 hours for the Matlab implementation). In both cases,
all parameters converged within 250 iterations of stages (2)
and (3) and are insensitive to initial choices for β and μ.

[0099] Significance Testing using Likelihood Ratios 

[0100] After the parameters have been determined for a
given set of observations, it is of immediate interest to use
the model to identify mean intensity pairs which are
significantly unequal such that μ_xi ≠ μ_yi, representing
genes that are differentially expressed between the two cell
populations. For each gene i, the generalized likelihood
ratio test (GLRT) (Kendall and Stuart 1979) statistic λ_i is
computed according to:



[0090] The model depends on six gene-independent
parameters β=(σ_εx, σ_εy, ρ_ε, σ_δx, σ_δy, ρ_δ) and a mean
pair per gene, μ=[(μ_x1, μ_y1), (μ_x2, μ_y2), . . . ,
(μ_xN, μ_yN)], for a total of 2N+6 parameters. The probability
density function for gene i is p=p(x_ij, y_ij | β, μ_xi, μ_yi).

[0091] Parameter Estimation by Maximum Likelihood 

[0092] Since β and μ are generally unknown, they can be
estimated by maximum likelihood estimation (MLE) as
described by Kendall and Stuart, The Advanced Theory of
Statistics, Volume 2 (4th ed., Macmillan Publishing Co.,
New York, N.Y., 1979), which is incorporated herein by
reference. Likelihood functions, for gene i and over all
genes, are respectively defined as:
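The defining equations are not legible in this text. On the usual reading, the likelihood for gene i is the product of the density p over its M replicate measurements, and the overall likelihood is the product over the N genes. The sketch below works with logarithms of those products. The function names are hypothetical, and the moment formulas inside `model_moments` are derived here from the stated error model rather than copied from the source.

```python
import math

def model_moments(mu_x, mu_y, beta):
    """Return (sd_x, sd_y, rho_xy) implied by the error model for one gene."""
    s_ex, s_ey, r_e, s_dx, s_dy, r_d = beta
    var_x = (mu_x * s_ex) ** 2 + s_dx ** 2
    var_y = (mu_y * s_ey) ** 2 + s_dy ** 2
    cov = mu_x * mu_y * r_e * s_ex * s_ey + r_d * s_dx * s_dy
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    return sd_x, sd_y, cov / (sd_x * sd_y)

def log_pdf(x, y, mu_x, mu_y, beta):
    """Log of the bivariate normal density p(x, y | beta, mu_x, mu_y)."""
    sd_x, sd_y, rho = model_moments(mu_x, mu_y, beta)
    zx, zy = (x - mu_x) / sd_x, (y - mu_y) / sd_y
    one_minus_r2 = 1.0 - rho * rho
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    norm = 2.0 * math.pi * sd_x * sd_y * math.sqrt(one_minus_r2)
    return -math.log(norm) - 0.5 * quad

def gene_log_likelihood(pairs, mu_x, mu_y, beta):
    """ln L_i: sum of log densities over the replicate (x, y) pairs of one gene."""
    return sum(log_pdf(x, y, mu_x, mu_y, beta) for x, y in pairs)

def total_log_likelihood(data, mu, beta):
    """ln L: sum of ln L_i over all genes."""
    return sum(gene_log_likelihood(pairs, mx, my, beta)
               for pairs, (mx, my) in zip(data, mu))

# Hypothetical data: two genes, three replicate (x, y) pairs each.
data = [[(100.0, 110.0), (120.0, 118.0), (95.0, 90.0)],
        [(1000.0, 2100.0), (1100.0, 2000.0), (950.0, 1900.0)]]
mu = [(105.0, 106.0), (1017.0, 2000.0)]
beta = (0.2, 0.2, 0.9, 10.0, 10.0, 0.1)  # (s_ex, s_ey, r_e, s_dx, s_dy, r_d)
print(total_log_likelihood(data, mu, beta))
```

Summing logarithms rather than multiplying densities avoids numerical underflow when N is in the thousands.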



[0093] The MLE parameter values maximizing L, designated
β̂ and μ̂, are estimates for the true parameters of the
underlying statistical model. In general, these values can be
found using standard optimization procedures as described
by Press et al., Numerical Recipes in C: The Art of Scientific
Computing (2nd ed., Cambridge University Press,
Cambridge, Mass.). Because N can be large, β̂ and μ̂ can be
determined by optimizing subsets of parameters in separate
stages:

[0094] (1) choose initial values for μ,

[0095] (2) select β to maximize L given current values
of μ,

[0096] (3) for i=1, . . . , N: select (μ_xi, μ_yi) to maximize
L_i, given current values of β, and

[0097] (4) repeat (2) and (3) until β, μ have converged.
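Stages (1) through (4) can be sketched as an alternating maximization. The code below is an illustration under stated assumptions, not the fmincon or C implementation the specification refers to. It swaps in a one-dimensional golden-section search applied to one parameter at a time, and the names `staged_fit`, `BOUNDS` and `mu_max`, the search bounds and the data are all hypothetical.

```python
import math

def log_pdf(x, y, mu_x, mu_y, beta):
    """Log bivariate normal density implied by the error model (covariance form)."""
    s_ex, s_ey, r_e, s_dx, s_dy, r_d = beta
    var_x = (mu_x * s_ex) ** 2 + s_dx ** 2
    var_y = (mu_y * s_ey) ** 2 + s_dy ** 2
    cov = mu_x * mu_y * r_e * s_ex * s_ey + r_d * s_dx * s_dy
    det = var_x * var_y - cov * cov
    dx, dy = x - mu_x, y - mu_y
    quad = (var_y * dx * dx - 2.0 * cov * dx * dy + var_x * dy * dy) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def total_ll(data, mu, beta):
    """ln L summed over all genes and replicates."""
    return sum(log_pdf(x, y, mx, my, beta)
               for pairs, (mx, my) in zip(data, mu) for x, y in pairs)

def golden_max(f, lo, hi, iters=40):
    """Golden-section search for a maximizer of f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def improve(f, current, lo, hi):
    """Accept the 1-D search result only if it does not lower f."""
    cand = golden_max(f, lo, hi)
    return cand if f(cand) >= f(current) else current

# Assumed search ranges for (s_ex, s_ey, r_e, s_dx, s_dy, r_d).
BOUNDS = [(1e-3, 1.0), (1e-3, 1.0), (-0.99, 0.99),
          (1e-3, 100.0), (1e-3, 100.0), (-0.99, 0.99)]

def staged_fit(data, beta, sweeps=20, tol=1e-6, mu_max=1e4):
    # (1) initial mu: per-gene sample means, clipped to be non-negative
    mu = [(max(0.0, sum(x for x, _ in p) / len(p)),
           max(0.0, sum(y for _, y in p) / len(p))) for p in data]
    beta = list(beta)
    history = [total_ll(data, mu, beta)]
    for _ in range(sweeps):
        # (2) update each system parameter in turn with mu held fixed
        for k, (lo, hi) in enumerate(BOUNDS):
            def f(v, k=k):
                return total_ll(data, mu, beta[:k] + [v] + beta[k + 1:])
            beta[k] = improve(f, beta[k], lo, hi)
        # (3) update each gene's mean pair with beta held fixed
        for i, pairs in enumerate(data):
            mx, my = mu[i]
            ll_i = lambda a, b: sum(log_pdf(x, y, a, b, beta) for x, y in pairs)
            mx = improve(lambda v: ll_i(v, my), mx, 0.0, mu_max)
            my = improve(lambda v: ll_i(mx, v), my, 0.0, mu_max)
            mu[i] = (mx, my)
        history.append(total_ll(data, mu, beta))
        # (4) stop once a full sweep no longer improves ln L
        if history[-1] - history[-2] < tol:
            break
    return beta, mu, history

# Hypothetical data: three genes, four replicate (x, y) pairs each.
data = [[(100, 110), (120, 118), (95, 90), (105, 108)],
        [(1000, 2100), (1100, 2000), (950, 1900), (1050, 2200)],
        [(10, 12), (-5, 3), (20, 8), (0, -2)]]
beta0 = (0.2, 0.2, 0.5, 10.0, 10.0, 0.1)
beta_hat, mu_hat, history = staged_fit(data, beta0)
print(history[0], history[-1])
```

Because every update is accepted only when it does not lower the quantity being maximized, ln L cannot decrease from one sweep to the next. A coordinate-wise search of this kind may stop at a local maximum, which a constrained optimizer over the full parameter set would handle differently.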



[0101] Two maximizations are performed: in the numerator,
the constraint μ_xi=μ_yi=μ is imposed, while in the
denominator the optimization is unconstrained. Under the null
hypothesis that μ_xi=μ_yi, β̂ remains a consistent estimator
when the constraint is imposed.
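The display equation for the statistic is not legible in this text. Taking the two maximizations described here together with the χ² reference distribution stated for λ, one consistent reading, offered as a reconstruction rather than a quotation, is the standard likelihood ratio form:

```latex
\lambda_i = -2 \ln\!\left[
  \frac{\max_{\mu}\; L_i(\hat{\beta}, \mu, \mu)}
       {\max_{\mu_x,\,\mu_y}\; L_i(\hat{\beta}, \mu_x, \mu_y)}
\right]
```

The numerator can never exceed the denominator, so λ_i is non-negative, and it grows as the constrained fit becomes worse relative to the unconstrained one.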

[0102] In the case that μ_xi=μ_yi, λ_i follows (asymptotically
in M and N) a χ² distribution with 1 degree of freedom
(DOF), whereas if μ_xi≠μ_yi the value of λ_i is expected to be
larger than would be obtained from random sampling of this
distribution. To select differentially-expressed genes with a
selection error of α, the false positive or Type-I error rate,
one would first determine the critical value λ_c for which the
χ² cumulative probability distribution is equal to 1-α, then
select the set of all genes i for which λ_i is in the critical
region λ_i>λ_c. The particular choice of α depends on the
number of genes on the array and the selection error which
the individual investigator is willing to tolerate.
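The selection rule can be illustrated with a short sketch. A χ² variable with 1 DOF is the square of a standard normal variable, so the critical value can be obtained from a normal quantile using only the Python standard library. The function names and the per-gene statistics below are hypothetical.

```python
from statistics import NormalDist

def chi2_1dof_critical(alpha):
    """Critical value lambda_c with P(chi2_1 <= lambda_c) = 1 - alpha."""
    # The (1 - alpha) quantile of chi-square with 1 DOF equals the
    # squared (1 - alpha/2) quantile of the standard normal.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

def select_genes(lambdas, alpha):
    """Indices i whose statistic lies in the critical region lambda_i > lambda_c."""
    lam_c = chi2_1dof_critical(alpha)
    return [i for i, lam in enumerate(lambdas) if lam > lam_c]

lambdas = [0.4, 12.7, 3.1, 25.0]            # hypothetical per-gene statistics
print(round(chi2_1dof_critical(0.05), 3))   # 3.841
print(select_genes(lambdas, 0.05))          # [1, 3]
```

As a rough guide to choosing α, an array of about 6000 genes tested at α=0.05 would be expected to pass on the order of 300 unchanged genes by chance alone, which is why a smaller α may be preferred for large arrays.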

EXAMPLE II 

Identification of Genes Differentially-Expressed in 
Response to Galactose Stimulation of Yeast Cells 

[0103] This example describes application of the math- 
ematical model of the variability observed over repeated 
observations of intensities for genes represented on a DNA 
microarray to the identification of genes differentially-ex- 
pressed in response to galactose stimulation. 

[0104] Assembly of the Microarray 

[0105] In order to explore the performance of the test for
differentially-expressed genes as shown in Example I,
Saccharomyces cerevisiae cultures growing in the absence of
galactose (YPR media) were compared to those growing in
galactose-stimulating conditions (YPRG) using a DNA

2. The method of claim 1, further comprising selecting a
mean signal μ that provides a maximum probability of
likelihood given said observed signal.

3. The method of claim 1, wherein said additive and 
multiplicative errors are independent with respect to each 
other. 

4. The method of claim 1, wherein said observed signal
and said mean signal further comprise the relationship:

where each measurement j=1, . . . , M, each analyte i=1,
. . . , N, and where x_ij is the observed signal and μ_i is
the mean signal.

5. The method of claim 1, wherein said additive and 
multiplicative errors further comprise a univariate distribu- 
tion. 

6. The method of claim 5, wherein said univariate distri- 
bution is a parametric distribution. 

7. The method of claim 6, wherein said parametric dis- 
tribution is a univariate normal distribution. 

8. The method of claim 7, wherein said univariate normal
distribution and said system parameter further comprise a
multiplicative error term consisting of a normal distribution
having standard deviation with respect to a signal mean
(σ_εx) and an additive error term consisting of a normal
distribution having standard deviation with respect to a
signal mean (σ_δx).

9. The method of claim 6, wherein said parametric dis- 
tribution is a t-distribution. 

10. The method of claim 6, wherein said parametric 
distribution is a gamma distribution. 

11. The method of claim 1, wherein said mean signal and 
system parameter are determined at the same time. 

12. The method of claim 1, wherein said system parameter 
is determined before said mean signal is determined. 

13. The method of claim 12, wherein said predetermined 
system parameter is used to determine said mean signal. 

14. The method of claim 1, wherein said enhanced values 
for said probability likelihood of said observed signals are 
produced one or more times until said mean signal and said 
system parameter converge. 

15. The method of claim 1, wherein said mean signal and 
said system parameter are determined by a method selected 
from the group consisting of maximum likelihood estima- 
tion (MLE), Quasi-Maximum Likelihood and Generalized 
Method of Moments. 

16. The method of claim 1, wherein determining said
mean signal and said system parameter further comprises a
non-linear optimization algorithm.

17. The method of claim 16, wherein said optimization
algorithm is selected from the group consisting of Gradient
Descent, Newton-Raphson and Simulated Annealing.

18. A method of determining a true signal of an analyte, 
comprising: 

(a) obtaining an observed signal x for one or more 
analytes; 

(b) providing a mean signal (μ) and a system parameter
(β) for said analyte;

(c) computing a probability likelihood of said observed
signal, said observed signal being related to said mean
signal by an additive error (δ) and a multiplicative error
(ε), where said system parameter specifies properties of
said additive error and said multiplicative error, and

(d) selecting a mean signal μ and a system parameter (β)
that provides a maximum probability likelihood of
occurrence given said observed signal.

19. The method of claim 18, wherein said additive and 
multiplicative errors are independent with respect to each 
other. 

20. The method of claim 18, wherein said observed signal 
and said mean signal further comprises the relationship: 

where each measurement j=1, . . . , M, each analyte i=1,
. . . , N, and where x_ij is the observed signal and μ_i is
the mean signal.

21. The method of claim 18, wherein said additive and 
multiplicative errors further comprise a univariate distribu- 
tion. 

22. The method of claim 1, wherein said univariate 
distribution is a parametric distribution. 

23. The method of claim 22, wherein said parametric 
distribution is a univariate normal distribution. 

24. The method of claim 23, wherein said univariate
normal distribution and said system parameter further
comprise a multiplicative error term consisting of a normal
distribution having standard deviation with respect to a
signal mean (σ_εx), and an additive error term consisting of
a normal distribution having standard deviation with respect
to a signal mean (σ_δx).

25. The method of claim 22, wherein said parametric
distribution is a t-distribution. 

26. The method of claim 22, wherein said parametric 
distribution is a gamma distribution. 

27. The method of claim 18, wherein said mean signal and 
system parameter are selected at the same time. 

28. The method of claim 18, wherein said system param- 
eter is selected before said mean signal is determined. 

29. The method of claim 28, wherein said preselected 
system parameter is used to select said mean signal. 

30. The method of claim 18, further comprising comput- 
ing said probability likelihood one or more times until said 
mean signal and said system parameter converge. 

31. The method of claim 18, wherein said mean signal and
said system parameter are determined by a method selected 
from the group consisting of maximum likelihood estima- 
tion (MLE), Quasi-Maximum Likelihood and Generalized 
Method of Moments. 

32. The method of claim 18, wherein selecting said mean 
signal and said system parameter further comprises a non- 
linear optimization algorithm. 

33. The method of claim 32, wherein said optimization 
algorithm is selected from the group consisting of Gradient 
Descent, Newton-Raphson and Simulated Annealing. 

34. A method of determining relative amounts of an 
analyte between samples, comprising: 

(a) measuring observed signals x and y for an analyte 
within two or more sample pairs, and 

(b) determining a mean signal pair per analyte (μ) and a
system parameter (β) for each sample pair that produce
enhanced values for a probability likelihood of said
observed signals, said observed signals being related to
said mean signals by an additive error (δ) and a
multiplicative error (ε), wherein said system parameter
specifies properties of said additive error (δ) and said
multiplicative error (ε).

35. The method of claim 34, further comprising selecting
a mean signal μ that provides a maximum probability of
occurrence given said observed signals.

36. The method of claim 34, wherein said additive and 
multiplicative errors are independent with respect to each 
other. 

37. The method of claim 34, wherein said observed 
signals and said mean signal pair per analyte within said 
sample pairs further comprise the relationship: 

x_ij = μ_xi + μ_xi ε_xij + δ_xij, and

y_ij = μ_yi + μ_yi ε_yij + δ_yij,

where each measurement j equals 1 through M and each
analyte i equals 1 through N; where x_ij and y_ij are the
observed signals, and where μ_xi and μ_yi are the mean
signals.

38. The method of claim 34, wherein said additive and 
multiplicative errors further comprise a bivariate distribu- 
tion. 

39. The method of claim 38, wherein said bivariate 
distribution is a parametric distribution. 

40. The method of claim 38, wherein said parametric 
distribution is a bivariate normal distribution. 

41. The method of claim 40, wherein said bivariate
normal distribution and said system parameter further
comprise a multiplicative error term consisting of a standard
deviation with respect to a mean of signal x (σ_εx), a standard
deviation with respect to a mean of signal y (σ_εy) and a
correlation between signals x and y (ρ_ε), and an additive
error term consisting of a standard deviation with respect to
a mean of signal x (σ_δx), a standard deviation with respect
to a mean of signal y (σ_δy) and a correlation between signals
x and y (ρ_δ).

42. The method of claim 39, wherein said parametric 
distribution is a t-distribution. 

43. The method of claim 39, wherein said parametric 
distribution is a bivariate gamma distribution. 

44. The method of claim 34, wherein said mean signal 
pair per analyte and system parameter are determined at the 
same time. 

45. The method of claim 34, wherein said system param- 
eter is determined before said mean signal pair per analyte 
is determined. 

46. The method of claim 45, wherein said predetermined 
system parameter is used to determine said mean signal pair 
per analyte. 

47. The method of claim 34, wherein said enhanced 
values for said probability likelihood of said observed 
signals are produced one or more times until said mean 
signal pair per analyte and said system parameter converge. 

48. The method of claim 34, wherein determining said 
mean signal pair per analyte and said system parameter 
further comprises a non-linear optimization algorithm. 

49. The method of claim 48, wherein said optimization 
algorithm is selected from the group consisting of Gradient 
Descent, Newton-Raphson and Simulated Annealing. 

50. The method of claim 34, further comprising identi- 
fying significantly unequal mean signal pairs per analyte by 
a statistical difference indicator. 

51. The method of claim 50, wherein said difference
indicator further comprises a generalized likelihood ratio
test statistic (λ).



52. A method of determining relative amounts of an 
analyte between samples, comprising: 

(a) obtaining observed signals x and y for an analyte 
within two or more sample pairs; 

(b) providing a mean signal pair per analyte (μ) and a
system parameter (β) for each sample pair;

(c) computing a probability likelihood of said observed
signals, said observed signals being related to said
mean signal by an additive error (δ) and a multiplicative
error (ε), where said system parameter specifies the
properties of said additive error and said multiplicative
error, and

(d) selecting a mean signal μ and a system parameter (β)
that provides a maximum probability likelihood of
occurrence given said observed signals.

53. The method of claim 52, wherein said additive and 
multiplicative errors are independent with respect to each 
other. 

54. The method of claim 52, wherein said observed
signals and said mean signal pair per analyte within said
sample pairs further comprise the relationship:

where each measurement j equals 1 through M and each
analyte i equals 1 through N; where x_ij and y_ij are the
observed signals, and where μ_xi and μ_yi are the mean
signals.

55. The method of claim 52, wherein said additive and 
multiplicative errors further comprise a bivariate distribu- 
tion. 

56. The method of claim 55, wherein said bivariate 
distribution is a parametric distribution. 

57. The method of claim 56, wherein said parametric 
distribution is a bivariate normal distribution. 

58. The method of claim 57, wherein said bivariate
normal distribution and said system parameter further
comprise a multiplicative error term consisting of a standard
deviation with respect to a mean of signal x (σ_εx), a standard
deviation with respect to a mean of signal y (σ_εy) and a
correlation between signals x and y (ρ_ε), and an additive
error term consisting of a standard deviation with respect to
a mean of signal x (σ_δx), a standard deviation with respect
to a mean of signal y (σ_δy) and a correlation between signals
x and y (ρ_δ).

59. The method of claim 56, wherein said parametric 
distribution is a t-distribution. 

60. The method of claim 56, wherein said mean signal 
pair per analyte and system parameter are determined at the 
same time. 

61. The method of claim 52, wherein said system param- 
eter is determined before said mean signal pair per analyte 
is determined. 

62. The method of claim 61, wherein said predetermined 
system parameter is used to determine said mean signal pair 
per analyte. 

63. The method of claim 52, further comprising comput- 
ing said probability likelihood of said observed signals one 
or more times until said mean signal pair per analyte and 
said system parameter converge. 

64. The method of claim 52, wherein said mean signal
pair per analyte and said system parameter are determined
by a method selected from the group consisting of maximum
likelihood estimation (MLE), Quasi-Maximum Likelihood
and Generalized Method of Moments.

65. The method of claim 52, wherein selecting said mean
signal pair per analyte and said system parameter further
comprises a non-linear optimization algorithm.

66. The method of claim 65, wherein said optimization
algorithm is selected from the group consisting of Gradient
Descent, Newton-Raphson and Simulated Annealing.

67. The method of claim 52, further comprising identi- 
fying said mean signal pair per analyte that are significantly 
unequal using a difference indicator. 

68. The method of claim 67, wherein said difference
indicator further comprises a generalized likelihood ratio
test statistic (λ).

69. The method of claim 67, further comprising selecting 
two or more mean signal pairs per analyte having a differ- 
ence indicator greater than that corresponding to a false 
positive error rate. 

70. The method of claim 52, wherein said analyte is a 
nucleic acid or polypeptide. 

71. A method of determining relative amounts of analytes 
between samples, comprising: 

(a) obtaining observed signals x and y for a plurality of 
immobilized analytes within two or more sample pairs; 

(b) determining a mean signal pair per analyte (μ) and a
system parameter (β) for each sample pair that provides
a maximum probability likelihood of occurrence given
said observed signals, said observed signals being
related to said mean signal by an additive error (δ) and
a multiplicative error (ε), where said system parameter
specifies the properties of said additive error and said
multiplicative error, and

(c) identifying one or more mean signal pairs per analyte 
that is significantly unequal. 

72. The method of claim 71, wherein said additive and 
multiplicative errors are independent with respect to each 
other. 

73. The method of claim 71, wherein said observed 
signals and said mean signal pair per analyte within said 
sample pairs further comprise the relationship: 

where each measurement j equals 1 through M and each
analyte i equals 1 through N; where x_ij and y_ij are the
observed signals, and where μ_xi and μ_yi are the mean
signals.

74. The method of claim 71, wherein said one or more 
mean signal pairs per analyte are identified as significantly 
unequal by using a difference indicator. 

75. The method of claim 74, wherein said difference
indicator further comprises a generalized likelihood ratio
test statistic (λ).

76. The method of claim 74, further comprising selecting 
two or more mean signal pairs per analyte having a differ- 
ence indicator greater than that corresponding to a false 
positive error rate. 

77. The method of claim 71, wherein said analyte is a 
nucleic acid or polypeptide. 



78. The method of claim 71, wherein said plurality of 
analytes further comprises about 1,000 or more different 
analytes. 

79. The method of claim 71, wherein said plurality of 
analytes further comprises about 10,000 or more different 
analytes. 

80. The method of claim 71, wherein said plurality of 
analytes further comprises about 30,000 or more different 
analytes. 

81. The method of claim 71, further comprising analytes
immobilized on a microarray.

82. The method of claim 71, further comprising the steps 
of: 

(a) obtaining one or more reference signals, and 

(b) determining a mean signal pair (μ) and a system
parameter (β) for a sample pair comprising said
observed signal x or y and said reference signal that
provides a maximum probability likelihood of occurrence
given said reference and observed signals, said
reference and observed signals being related to said
mean signal by an additive error (δ) and a multiplicative
error (ε), wherein said system parameter specifies
the properties of said additive error and said
multiplicative error.

83. A method of determining relative amounts of an 
analyte between samples, comprising: 

(a) obtaining a reference signal; 

(b) obtaining observed signals x and y for an analyte 
within two or more sample pairs; 

(c) determining system parameters (β_1, β_2) for a sample
pair comprising said observed signals x or y and said
reference signal that provide a probability likelihood of
said occurrence given said observed and reference
signals, said observed and reference signals being
related to said mean signal by an additive error (δ) and
a multiplicative error (ε), where said system parameter
specifies the properties of said additive error and said
multiplicative error;

(d) determining mean signal pairs (μ_1, μ_2) for said sample
pair comprising maximizing a product of terms for said
probability likelihood of said sample pair of observed
signals x or y and said reference signal for said analyte,
and

(e) selecting a mean signal μ_1 or μ_2 that provides a
maximum probability likelihood of occurrence given
said observed signals and system parameters β_1 and β_2.

84. The method of claim 83, wherein said mean signal
pairs (μ_1, μ_2) are determined using β_1 and β_2 obtained from
step (c).

85. A method of determining relative amounts of an 
analyte between samples, comprising: 

(a) measuring observed signals x, y and z for an analyte 
within two or more sample sets, and 

(b) determining a mean signal set per analyte (μ) and a
system parameter (β) for each sample set that produce
enhanced values for a probability likelihood for said
observed signals, said observed signals being related to
mean signals by an additive error (δ) and a multiplicative
error (ε).

***** 



